33 research outputs found
On the Differentiability of the Solution to Convex Optimization Problems
In this paper, we provide conditions under which one can take derivatives of
the solution to convex optimization problems with respect to problem data.
These conditions are (roughly) that Slater's condition holds, the functions
involved are twice differentiable, and that a certain Jacobian matrix is
non-singular. The derivation involves applying the implicit function theorem to
the necessary and sufficient KKT system for optimality.
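A minimal sketch of the idea, for the simplest possible case rather than the paper's general KKT construction: for an unconstrained strongly convex quadratic, the first-order optimality condition defines the solution implicitly, and the implicit function theorem gives its derivative with respect to the problem data. All names and values below are illustrative.

```python
import numpy as np

# minimize_x (1/2) x^T Q x - b^T x.  Optimality condition: Q x*(b) - b = 0.
# The implicit function theorem then gives  d x* / d b = Q^{-1}
# (the relevant Jacobian here is Q, which is nonsingular by construction).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4 * np.eye(4)        # positive definite, hence nonsingular
b = rng.standard_normal(4)

x_star = np.linalg.solve(Q, b)     # the argmin
J_analytic = np.linalg.inv(Q)      # derivative of x* with respect to b

# Check against finite differences.
eps = 1e-6
J_fd = np.empty((4, 4))
for j in range(4):
    e = np.zeros(4)
    e[j] = eps
    J_fd[:, j] = (np.linalg.solve(Q, b + e) - x_star) / eps

assert np.allclose(J_analytic, J_fd, atol=1e-5)
```

With inequality constraints the same argument is applied to the full KKT system, subject to the regularity conditions the abstract lists.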
InterpNET: Neural Introspection for Interpretable Deep Learning
Humans are able to explain their reasoning; deep neural networks, in
contrast, are not. This paper attempts to bridge this gap by introducing a new
way to design interpretable neural networks for classification, inspired by
physiological evidence of the human visual system's inner workings. This paper
proposes a neural network design paradigm, termed InterpNET, which can be
combined with any existing classification architecture to generate natural
language explanations of the classifications. The success of the module relies
on the assumption that the network's computation and reasoning are represented
in its internal layer activations. While in principle InterpNET could be
applied to any existing classification architecture, it is evaluated via an
image classification and explanation task. Experiments on a CUB bird
classification and explanation dataset show qualitatively and quantitatively
that the model is able to generate high-quality explanations. While the current
state-of-the-art METEOR score on this dataset is 29.2, InterpNET achieves a
much higher METEOR score of 37.9.
Comment: Presented at NIPS 2017 Symposium on Interpretable Machine Learning.
Active Robotic Mapping through Deep Reinforcement Learning
We propose an approach to learning agents for active robotic mapping, where
the goal is to map the environment as quickly as possible. The agent learns to
map efficiently in simulated environments by receiving rewards corresponding to
how fast it constructs an accurate map. In contrast to prior work, this
approach learns an exploration policy based on a user-specified prior over
environment configurations and sensor model, allowing it to specialize to the
specifications. We evaluate the approach through a simulated Disaster Mapping
scenario and find that it achieves performance slightly better than a
near-optimal myopic exploration scheme, suggesting that it could be useful in
more complicated problem scenarios.
A Matrix Gaussian Distribution
In this note, we define a Gaussian probability distribution over matrices. We
prove some useful properties of this distribution, namely, the fact that
marginalization, conditioning, and affine transformations preserve the matrix
Gaussian distribution. We also derive useful results regarding the expected
value of certain quadratic forms based solely on covariances between matrices.
Previous definitions of matrix normal distributions are severely
under-parameterized, assuming unrealistic structure on the covariance (see
Section 2). We believe that our generalization is better equipped for use in
practice.
Comment: 4 pages.
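The closure under affine transformations rests on a standard identity: vectorization turns the matrix map X → AXB into a linear map on vec(X), and linear images of Gaussian vectors are Gaussian. A small deterministic check of that identity (illustrative shapes only, not the note's notation):

```python
import numpy as np

# vec(A X B) = (B^T kron A) vec(X), with column-major (Fortran-order) vec.
# Since X -> A X B + C acts linearly on vec(X), it maps a Gaussian
# distribution over matrices to another Gaussian distribution.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

lhs = (A @ X @ B).flatten(order="F")            # vec(A X B)
rhs = np.kron(B.T, A) @ X.flatten(order="F")    # (B^T kron A) vec(X)
assert np.allclose(lhs, rhs)
```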
Optimizing for Generalization in Machine Learning with Cross-Validation Gradients
Cross-validation is the workhorse of modern applied statistics and machine
learning, as it provides a principled framework for selecting the model that
maximizes generalization performance. In this paper, we show that the
cross-validation risk is differentiable with respect to the hyperparameters and
training data for many common machine learning algorithms, including logistic
regression, elastic-net regression, and support vector machines. Leveraging
this property of differentiability, we propose a cross-validation gradient
method (CVGM) for hyperparameter optimization. Our method enables efficient
optimization in high-dimensional hyperparameter spaces of the cross-validation
risk, the best surrogate of the true generalization ability of our learning
algorithm.
Comment: 11 pages.
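A sketch of the differentiability claim in the simplest instance, ridge regression with a single train/validation split rather than the paper's full cross-validation setting (all data here is synthetic): the fitted weights have a closed form, so the validation risk's gradient in the regularization hyperparameter can be computed analytically and checked by finite differences.

```python
import numpy as np

# Ridge solution: (X^T X + lam I) w = X^T y.  Differentiating in lam gives
# w + (X^T X + lam I) dw = 0, i.e. dw/dlam = -(X^T X + lam I)^{-1} w,
# from which the validation-risk gradient follows by the chain rule.
rng = np.random.default_rng(2)
n, m, d = 40, 20, 5
X, Xv = rng.standard_normal((n, d)), rng.standard_normal((m, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)
yv = Xv @ w_true + 0.1 * rng.standard_normal(m)

def risk_and_grad(lam):
    H = X.T @ X + lam * np.eye(d)
    w = np.linalg.solve(H, X.T @ y)
    dw = -np.linalg.solve(H, w)          # dw/dlam
    r = Xv @ w - yv                      # validation residual
    return (r @ r) / m, (2 / m) * (r @ Xv) @ dw

lam = 0.5
_, g = risk_and_grad(lam)
g_fd = (risk_and_grad(lam + 1e-6)[0] - risk_and_grad(lam - 1e-6)[0]) / 2e-6
assert abs(g - g_fd) < 1e-5
```

Gradient descent on `lam` using this gradient is the one-hyperparameter analogue of the CVGM; the point of the paper is that the same property scales to high-dimensional hyperparameter spaces.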
Stochastic Control with Affine Dynamics and Extended Quadratic Costs
An extended quadratic function is a quadratic function plus the indicator
function of an affine set, that is, a quadratic function with embedded linear
equality constraints. We show that, under some technical conditions, random
convex extended quadratic functions are closed under addition, composition with
an affine function, expectation, and partial minimization, that is, minimizing
over some of its arguments. These properties imply that dynamic programming can
be tractably carried out for stochastic control problems with random affine
dynamics and extended quadratic cost functions. While the equations for the
dynamic programming iterations are much more complicated than for traditional
linear quadratic control, they are well suited to an object-oriented
implementation, which we describe. We also describe a number of known and new
applications.
Comment: 46 pages, 16 figures.
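For orientation, the special case being generalized: in classical finite-horizon LQR the value function stays a plain quadratic, and dynamic programming reduces to the backward Riccati recursion. A minimal sketch with illustrative dynamics (the paper's extended quadratic setting adds the affine sets and linear equality constraints):

```python
import numpy as np

# Dynamics x_{t+1} = A x_t + B u_t, stage cost x^T Q x + u^T R u.
# Value functions are quadratic, V_t(x) = x^T P_t x, and DP runs backward:
#   K = (R + B^T P B)^{-1} B^T P A        (optimal feedback gain)
#   P <- Q + A^T P (A - B K)              (Riccati update)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P = Q.copy()                              # terminal value function
for _ in range(50):                       # backward in time
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

# The recursion preserves symmetric positive definite P, so each value
# function is a convex quadratic.
assert np.allclose(P, P.T, atol=1e-8)
assert np.all(np.linalg.eigvalsh(P) > 0)
```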
A Distributed Method for Fitting Laplacian Regularized Stratified Models
Stratified models are models that depend in an arbitrary way on a set of
selected categorical features, and depend linearly on the other features. In a
basic and traditional formulation a separate model is fit for each value of the
categorical feature, using only the data that has the specific categorical
value. To this formulation we add Laplacian regularization, which encourages
the model parameters for neighboring categorical values to be similar.
Laplacian regularization allows us to specify one or more weighted graphs on
the stratification feature values. For example, stratifying over the days of
the week, we can specify that the Sunday model parameter should be close to the
Saturday and Monday model parameters. The regularization improves the
performance of the model over the traditional stratified model, since the model
for each value of the categorical feature 'borrows strength' from its neighbors. In
particular, it produces a model even for categorical values that did not appear
in the training data set.
We propose an efficient distributed method for fitting stratified models,
based on the alternating direction method of multipliers (ADMM). When the
fitting loss functions are convex, the stratified model fitting problem is
convex, and our method computes the global minimizer of the loss plus
regularization; in other cases it computes a local minimizer. The method is
very efficient, and naturally scales to large data sets or numbers of
stratified feature values. We illustrate our method with a variety of examples.
Comment: 37 pages, 6 figures.
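A hypothetical minimal instance of the idea, small enough to solve directly rather than with the paper's ADMM method: one scalar parameter per day of the week, tied together by a cycle-graph Laplacian. The counts and targets below are invented for illustration.

```python
import numpy as np

# minimize  sum_k counts_k (theta_k - ybar_k)^2 + lam * theta^T L theta
# where L is the Laplacian of a 7-cycle (each day neighbors the adjacent days).
K = 7                                         # days of the week
counts = np.array([5.0, 4.0, 6.0, 5.0, 3.0, 0.0, 2.0])   # day 5 has no data
ybar = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 0.0, 2.0])

L = 2 * np.eye(K)
for k in range(K):
    L[k, (k + 1) % K] -= 1.0
    L[k, (k - 1) % K] -= 1.0

# The objective is quadratic, so the minimizer solves a linear system.
lam = 1.0
theta = np.linalg.solve(np.diag(counts) + lam * L, counts * ybar)

# 'Borrowing strength': the day with no data still gets a parameter, here
# exactly the average of its two neighbors.
assert np.isclose(theta[5], 0.5 * (theta[4] + theta[6]))
```

The stationarity condition for the empty category involves only the Laplacian term, which is how a model is produced even for categorical values absent from the training data.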
Learning Probabilistic Trajectory Models of Aircraft in Terminal Airspace from Position Data
Models for predicting aircraft motion are an important component of modern
aeronautical systems. These models help aircraft plan collision avoidance
maneuvers and help conduct offline performance and safety analyses. In this
article, we develop a method for learning a probabilistic generative model of
aircraft motion in terminal airspace, the controlled airspace surrounding a
given airport. The method fits the model based on a historical dataset of
radar-based position measurements of aircraft landings and takeoffs at that
airport. We find that the model generates realistic trajectories, provides
accurate predictions, and captures the statistical properties of aircraft
trajectories. Furthermore, the model trains quickly, is compact, and allows for
efficient real-time inference.
Comment: IEEE Transactions on Intelligent Transportation Systems.
Optimal Representative Sample Weighting
We consider the problem of assigning weights to a set of samples or data
records, with the goal of achieving a representative weighting, which happens
when certain sample averages of the data are close to prescribed values. We
frame the problem of finding representative sample weights as an optimization
problem, which in many cases is convex and can be efficiently solved. Our
formulation includes as a special case the selection of a fixed number of the
samples, with equal weights, i.e., the problem of selecting a smaller
representative subset of the samples. While this problem is combinatorial and
not convex, heuristic methods based on convex optimization seem to perform very
well. We describe rsw, an open-source implementation of the ideas described in
this paper, and apply it to a skewed sample of the CDC BRFSS dataset.
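A minimal sketch of the representative-weighting problem (a generic solver on synthetic data, not the rsw implementation): choose nonnegative weights summing to one so that weighted feature averages match prescribed targets in least squares.

```python
import numpy as np
from scipy.optimize import minimize

# Weighted sample averages are F^T w; we push them toward prescribed targets
# over the probability simplex {w : w >= 0, sum(w) = 1}.
rng = np.random.default_rng(3)
n, d = 50, 3
F = rng.standard_normal((n, d))        # d features per sample
target = np.array([0.1, -0.2, 0.0])    # prescribed feature averages

def objective(w):
    return np.sum((F.T @ w - target) ** 2)

res = minimize(
    objective,
    x0=np.full(n, 1.0 / n),                              # start from uniform
    bounds=[(0.0, 1.0)] * n,                             # w >= 0
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    method="SLSQP",
)
w = res.x
assert res.success
assert np.isclose(w.sum(), 1.0) and w.min() >= -1e-9
```

Restricting weights to {0, 1/k} recovers the combinatorial subset-selection variant the abstract mentions, for which convex relaxations like the one above serve as heuristics.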
Multi-Period Liability Clearing via Convex Optimal Control
We consider the problem of determining a sequence of payments among a set of
entities that clear (if possible) the liabilities among them. We formulate this
as an optimal control problem, which is convex when the objective function is,
and therefore readily solved. For this optimal control problem, we give a
number of useful and interesting convex costs and constraints that can be
combined in any way for different applications. We describe a number of
extensions, for example to handle unknown changes in cash and liabilities, to
allow bailouts, to find the minimum time to clear the liabilities, or to
minimize the number of non-cleared liabilities, when fully clearing the
liabilities is impossible.
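A toy single-period instance of the clearing idea (the abstract treats the multi-period control problem; the entities and amounts here are invented): three entities in a cycle, each owing the next 10 while holding only 2 in cash, yet the whole cycle clears because payments circulate.

```python
import numpy as np
from scipy.optimize import linprog

# p_i pays down the liability from entity i to entity i+1 (indices mod 3).
# Cash constraint for entity i:  cash_i + p_{i-1} - p_i >= 0,
# i.e.  p_i - p_{i-1} <= cash_i.  Maximize the total amount cleared.
liab = np.array([10.0, 10.0, 10.0])
cash = np.array([2.0, 2.0, 2.0])

A_ub = np.array([
    [ 1.0,  0.0, -1.0],    # entity 0 pays p_0, receives p_2
    [-1.0,  1.0,  0.0],    # entity 1 pays p_1, receives p_0
    [ 0.0, -1.0,  1.0],    # entity 2 pays p_2, receives p_1
])
res = linprog(
    c=-np.ones(3),                     # maximize sum of payments
    A_ub=A_ub, b_ub=cash,
    bounds=[(0.0, v) for v in liab],   # 0 <= p_i <= liability
    method="highs",
)
assert res.status == 0
assert np.allclose(res.x, liab)        # all liabilities clear
```

The multi-period formulation stacks such payment variables over time with dynamics for cash and liability balances, which is what makes it an optimal control problem.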